
International Journal of Medical Informatics

25 training papers 2019-06-25 – 2026-03-07

Top medRxiv preprints most likely to be published in this journal, ranked by match strength.

1
Evaluating large language models for natural-language-to-code generation on aggregate Czech public health data analysis
2025-12-11 health informatics 10.64898/2025.12.05.25341697
#1 (6.4%)

Large language models (LLMs) are increasingly explored as tools for healthcare research and data analysis. However, their applicability to structured public health datasets, especially in non-English contexts, remains underexamined. We systematically evaluated 11 state-of-the-art LLMs on their ability to generate executable Python code for analytical queries over Czech public health datasets, focusing on incidence and prevalence data provided by the National Health Information Portal (known as N...

2
Classifying polyneuropathy and myopathy patients on Electronic Health Records
2025-12-12 health informatics 10.64898/2025.12.11.25342051
#1 (6.0%)

Background: Rare neuromuscular diseases such as polyneuropathy (PN) and myopathy (MY) often share symptomatic characteristics, leading to diagnostic challenges and delays. Machine learning applied to routine care data of electronic health records (EHRs) offers the potential for accelerating accurate diagnosis. Objective: To develop and evaluate machine learning models to distinguish between patients with PN and MY using EHR data, as a step toward tools that could support improved diagnostic process...

3
Leveraging NLP to Identify Domain-Specific Variables in Large-Scale Cohort Metadata: A Sleep Use Case
2026-01-19 health informatics 10.64898/2026.01.18.26344317
Top 0.1% (5.8%)

Public health policies increasingly rely on the use of complex and large datasets containing heterogeneous, multimodal data that require advanced analytical methods to extract meaningful insights and support evidence-based decision-making. Essential for the sharing and analysis of public health data is the description of the data ("data about data") or metadata. Indeed, a lack of metadata standards has been identified as a key technical barrier to public health data sharing 1. Metadata varies c...

4
Improvement in Albuminuria Screening Associated with EHR Decision Support Change
2026-02-14 health informatics 10.64898/2026.02.09.26345709
Top 0.2% (5.4%)

Background: Albuminuria is associated with increased risk of cardiovascular disease (CVD), heart failure, and progression of chronic kidney disease (CKD). Early detection of albuminuria, done through spot urine albumin creatinine ratio (UACR) testing, enables more accurate risk stratification and timely use of preventative therapies. Yet albuminuria screening remains unacceptably low in the hypertension population. Methods: We evaluated two EHR-embedded clinical decision support (CDS) strategies at Geisinger Health Syste...

5
A medically-grounded LLM agent-based tool to detect patient safety events in medical records
2025-12-18 health informatics 10.64898/2025.12.16.25342438
Top 0.3% (5.0%)

Large language models (LLMs) have shown incredible promise in medicine. While LLMs may be particularly useful in areas requiring extensive review of clinical records, their use remains limited due to their tendency to hallucinate and fabricate information. Hallucination issues, as well as their consequences, are exacerbated in low-probability, high-stakes scenarios such as rare adverse safety events or medical errors. We present SAFE-AI (Structured and Automated Framework for Explainable AI), a ...

6
Predicting Rectal Cancer Patient Survival with Dutch Radiology Reports using Natural Language Processing (NLP): The Role of Pretrained Language Models
2026-01-30 health informatics 10.64898/2026.01.23.26344428
Top 0.3% (5.0%)

The use of Electronic Health Records (EHRs) has increased significantly in recent years. However, a substantial portion of the clinical data remains in unstructured text formats, especially in the context of radiology. This limits the application of EHRs for automated analysis in oncology research. Pretrained language models have been utilized to extract feature embeddings from these reports for downstream clinical applications, such as treatment response and survival prediction. However, a thor...

7
Exploring Needs and Priorities in Digital Health Management for Rare Disease Patients and their Caregivers: A Mixed-Methods Study
2026-01-30 health informatics 10.64898/2026.01.28.26345095
Top 0.3% (4.9%)

Rare diseases affect millions worldwide and are associated with long diagnostic delays, limited access to treatments, and substantial challenges in daily care and coordination. Digital health technologies, including mobile apps, telehealth, and data-sharing platforms, offer opportunities to improve care and quality of life for people living with rare diseases. As these tools rapidly expand, this study examines the needs, expectations, and conditions for successful adoption of patient-centered di...

8
Transformer-based structuring of Italian electronic health records with application in cardiac settings
2026-01-23 health informatics 10.64898/2026.01.22.26344603
Top 0.4% (4.9%)

Purpose: Natural Language Processing (NLP) has the potential to extract structured clinical knowledge from unstructured Electronic Health Records (EHRs). However, the limited availability of annotated datasets for algorithm training restricts its application in clinical practice. This study investigates the use of transformer-based NLP models to structure Italian EHRs in cardiac settings, addressing this gap. Methods: We implemented and evaluated three named entity recognition algorithms: SpaCy, Fl...

9
Navigating the DiGA Jungle: A Taxonomy and Archetypal Framework of the German Digital Therapeutics Landscape
2025-12-30 health informatics 10.64898/2025.12.30.25343225
Top 0.4% (4.9%)

Digital therapeutics (DTx) are patient-facing apps designed to support individuals in their daily lives. Therefore, they have the potential to revolutionize healthcare by empowering and engaging patients to become active players in their own care. Despite the increasing adoption of DTx in national healthcare systems, research on their design remains limited. The present study introduces "DiGATax", a taxonomy designed to categorize and analyze DTx, including perspectives on content, intervention ...

10
Build fair machine learning models to predict adverse outcomes for Heart failure patients with preserved ejection fraction (HFpEF) and with reduced ejection fraction (HFrEF)
2025-12-19 health informatics 10.64898/2025.12.18.25342417
Top 0.4% (4.9%)

Background: Heart failure (HF), including heart failure with preserved ejection fraction (HFpEF) and heart failure with reduced ejection fraction (HFrEF), remains a major global health challenge, particularly among aging populations. Timely and accurate prediction of severe adverse outcomes associated with HF is critical for optimizing care, reducing disease burden, and improving outcomes. Although social determinants of health (SDoH) have been recognized as key drivers of HF disparities and assoc...

11
Clinical Med students' validation of Arkangel AI: Are their responses any better when supported by the AI?
2026-01-09 health informatics 10.64898/2026.01.07.25342560
Top 0.4% (4.8%)

Introduction: Large Language Models (LLMs) in healthcare practice and education have been evaluated using medical question-answering (QA) datasets, with excellent performance. However, multiple-choice questions fall short when assessing more complex language interactions. Objective: To evaluate the time invested and validity of medical students' responses to clinical questions using ArkangelAI, compared to traditional search methods. Methods: Randomized, double-blind trial with clinical medical stude...

12
Augmenting Electronic Health Records for Adverse Event Detection
2026-02-11 health informatics 10.64898/2026.02.10.26345962
Top 0.5% (4.3%)

Objective: Adverse events (AEs) resulting from medical interventions are significant contributors to patient morbidity, mortality, and healthcare costs. Prediction of these events using electronic health records (EHRs) can facilitate timely clinical interventions. However, effective prediction remains challenging due to severe class imbalance, missing labels, and the complexity of EHR records. Classical machine learning approaches frequently underperform due to insufficient representation of minor...

13
Identifying Reasons for ACEI/ARB Non-Use in CKD Using Scalable Clinical NLP with Schema-Guided LLM Augmentation
2026-02-12 health informatics 10.64898/2026.02.10.26346025
Top 0.5% (4.2%)

IMPORTANCE: Although angiotensin-converting enzyme inhibitors (ACEIs) and angiotensin receptor blockers (ARBs) are recommended for people with chronic kidney disease (CKD), they remain underused. Barriers to adherence, such as adverse effects or patient refusal, are frequently embedded within unstructured clinical narratives and are therefore inaccessible to structured data analytics. Scalable natural language processing (NLP) approaches are needed to identify these barriers and support guideline-...

14
Determinants of Digital Health Technology Acceptance Among Healthcare Caregivers: A Structural Equation Modeling Approach
2026-01-30 health informatics 10.64898/2026.01.28.26345025
Top 0.5% (4.2%)

Background: Digital health technologies, including artificial intelligence (AI)-powered tools and virtual reality (VR) interventions, are increasingly being deployed to support caregivers of patients with chronic conditions. However, the factors influencing caregiver acceptance of these technologies remain poorly understood. Objective: This study aimed to develop and validate a structural equation model (SEM) to examine the determinants of digital health technology acceptance among caregivers of pa...

15
Development and validation of a generative AI-assisted medication-indication knowledge base
2026-01-06 health informatics 10.64898/2026.01.06.26343341
Top 0.5% (4.2%)

Background: Existing information resources about medicines and their indications have limited usefulness for health data analytics. The emerging potential of large language models (LLMs) to generate clinically accurate responses presents a novel opportunity to develop a comprehensive knowledge base of medicines and their clinical indications. Method: Unique medications from the English Prescribing Dataset (EPD) were extracted and included in a fine-tuned prompt pipeline using the GPT-4 and MedCAT L...

16
Assessing Multimodal AI for Visual Information Extraction of Pharmacology
2026-01-16 health informatics 10.64898/2026.01.15.26344119
Top 0.5% (4.2%)

While Americans are using herbal dietary supplements (natural products) more than ever, the consumption of natural products with prescription drugs can lead to harmful interactions. Pharmacovigilance of natural products depends on careful expert review and interpretation of a wide variety of evidence. In prior work, we demonstrated the value of a knowledge graph (NP-KG) for assisting with natural product safety investigations. However, scaling the NP-KG from 33 natural products to the thousands on...

17
A Review of Point-of-Care Devices for Blood-Testing Towards AI-driven Remote Digital Care, Precision Healthcare and Predictive Medicine
2025-12-15 health informatics 10.64898/2025.12.13.25340658
Top 0.5% (4.1%)

Point-of-care (POC) blood testing enables rapid, decentralized diagnostics with transformative promise, yet its innovation landscape remains poorly mapped. To this end, we focused on features that we believe are key to making progress in precision healthcare and predictive medicine, such as longitudinal data collection and data analytics integration. While no review can be complete, this work attempts to address this gap by analyzing 86 POC blood testing devices worldwide and proposing a ...

18
Drug or Pokemon? An analysis of the ability of large language models to discern fabricated medications
2026-01-13 health informatics 10.64898/2026.01.12.26343930
Top 0.6% (4.1%)

Background: The use of large language models (LLMs) is increasing in the medical field; however, LLMs are often subject to "confabulations." Notably, LLMs are vulnerable to adversarial attacks, or fabricated details within prompts, which is concerning given both health misinformation and inadvertent errors in the medical record. The purpose of this study was to determine the effect of adversarial attacks by embedding one fabricated medication into a list of existing medicines. Methods: A total...

19
Patient Attitudes Toward Artificial Intelligence in Jordanian Healthcare: A Cross-Sectional Survey Study
2026-02-24 health informatics 10.64898/2026.02.22.26346852
Top 0.6% (4.1%)

Artificial intelligence (AI) is increasingly integrated into healthcare delivery, yet patient acceptance in resource-constrained settings remains incompletely characterized. This study assessed attitudes toward AI-supported care among patients attending hospitals in three Jordanian governorates (Amman, Balqa, Irbid) and examined demographic and digital literacy correlates of acceptance. In a cross-sectional survey (n = 500 complete questionnaires), participants rated exposure to AI in healthcare...

20
Can Large Language Models Reduce the Cost of Extracting Data from Electronic Health Records for Research?
2026-01-11 health informatics 10.64898/2026.01.09.26343792
Top 0.7% (4.0%)

Objective: Much medical data is only available in unstructured electronic health records (EHR). These data can be obtained through manual (human) extraction or programmatic natural language processing (NLP) methods. We estimate that NLP only becomes economically competitive with manual extraction when there are ~6500 EHR records. We have found that there is interest from clinicians and researchers in using NLP on projects with fewer records. We examine whether a large language model (LLM) can be ...